Random centroid initialization for improving centroid-based clustering
Authors
Abstract
A method for improving centroid-based clustering is suggested. The improvement is built on diversification of the k-means++ initialization. The k-means++ algorithm, claimed to be a better version of k-means, is tested in a computational set-up where the dataset size, the number of features, and the number of clusters are varied. The statistics obtained from the testing show that, in roughly 50 % of instances to cluster, k-means++ outputs worse results than k-means with random centroid initialization. The impact of random centroid initialization solidifies as both the dataset size and the number of features increase. To reduce the possible underperformance of k-means++, k-means with random centroid initialization is run on a separate processor core in parallel with k-means++, whereupon the better result is selected. The number of k-means++ runs is set to be not less than the number of k-means runs. By incorporating random centroid seeding alongside the k-means++ initialization, clustering gains about 0.05 in accuracy in every second instance to cluster.
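The parallel scheme in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the author's implementation, and several details are assumptions: the "better" result is taken to be the one with the lower within-cluster sum of squared distances (inertia), because ground-truth accuracy is unavailable at clustering time; a thread pool stands in for separate processor cores so the example stays self-contained; and the names `diversified_kmeans`, `best_of`, `n_pp_runs`, and `n_random_runs` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_random(data, k, rng):
    """Random centroid initialization: k distinct data points drawn uniformly."""
    return [list(p) for p in rng.sample(data, k)]


def init_kmeanspp(data, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centroids = [list(rng.choice(data))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in data]
        total = sum(d2)
        if total == 0:  # every point already coincides with a centroid
            centroids.append(list(rng.choice(data)))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(data, d2):
            acc += w
            if acc >= r:
                centroids.append(list(p))
                break
        else:  # guard against floating-point round-off
            centroids.append(list(data[-1]))
    return centroids


def assign(data, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in data]


def lloyd(data, centroids, max_iter=100):
    """Standard k-means (Lloyd) iterations; returns (inertia, labels)."""
    for _ in range(max_iter):
        labels = assign(data, centroids)
        new = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else old)
        if new == centroids:
            break
        centroids = new
    labels = assign(data, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return inertia, labels


def best_of(data, k, init, n_runs, seed):
    """Run k-means n_runs times with the given initializer; keep the lowest inertia."""
    rng = random.Random(seed)
    runs = (lloyd(data, init(data, k, rng)) for _ in range(n_runs))
    return min(runs, key=lambda result: result[0])


def diversified_kmeans(data, k, n_pp_runs=10, n_random_runs=10, seed=0):
    """Run k-means++ and random-init k-means side by side; return the better one."""
    if n_pp_runs < n_random_runs:
        raise ValueError("k-means++ runs must not be fewer than random-init runs")
    # Threads keep the sketch self-contained; swapping in ProcessPoolExecutor
    # would put each branch on its own CPU core.
    with ThreadPoolExecutor(max_workers=2) as pool:
        pp = pool.submit(best_of, data, k, init_kmeanspp, n_pp_runs, seed)
        rnd = pool.submit(best_of, data, k, init_random, n_random_runs, seed + 1)
        results = {"k-means++": pp.result(), "random": rnd.result()}
    winner = min(results, key=lambda name: results[name][0])
    return winner, results[winner]


if __name__ == "__main__":
    gen = random.Random(42)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(cx + gen.gauss(0, 1), cy + gen.gauss(0, 1))
            for cx, cy in centers for _ in range(50)]
    name, (inertia, labels) = diversified_kmeans(data, k=3)
    print(name, round(inertia, 2))
```

The check `n_pp_runs >= n_random_runs` mirrors the abstract's rule that k-means++ is run no fewer times than k-means. Selection uses inertia here because the accuracy figure reported in the abstract requires ground-truth labels, which a clustering run does not have.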
Similar resources
Pseudo-centroid clustering
Pseudo-Centroid Clustering replaces the traditional concept of a centroid expressed as a center of gravity with the notion of a pseudo-centroid (or a coordinate free centroid) which has the advantage of applying to clustering problems where points do not have numerical coordinates (or categorical coordinates that are translated into numerical form). Such problems, for which classical centroids ...
Novel centroid selection approaches for KMeans-clustering based recommender systems
Recommender systems have the ability to filter unseen information for predicting whether a particular user would prefer a given item when making a choice. Over the years, this process has been dependent on robust applications of data mining and machine learning techniques, which are known to have scalability issues when being applied for recommender systems. In this paper, we propose a k-means ...
Density-Based Centroid Approximation for Initializing Iterative Clustering Algorithms
We present KDI (Kernel Density Initialization), a density-based procedure for approximating centroids for the initialization step of iteration-based clustering algorithms. We show empirically that a rather low number of distance calculations in conjunction with a fast algorithm for finding the highest peaks are sufficient for effectively and efficiently finding a pre-specified number of good centroids, w...
Using Class Frequency for Improving Centroid-based Text Classification
Most previous works on text classification represented the importance of terms by term occurrence frequency (tf) and inverse document frequency (idf). This paper presents the ways to apply class frequency in centroid-based text categorization. Three approaches are taken into account. The first one is to explore the effectiveness of inverse class frequency on the popular term weighting, i.e., TFIDF...
Improving Centroid-based Text Classification Using Term-distribution-based Weighting System and Clustering
Centroid-based text classification is one of the most popular supervised approaches to classify texts into a set of pre-defined classes with relatively low computation. Based on the vector-space model, the performance of this classification particularly depends on the way to weigh terms in documents in order to construct a representative class vector for each class and degree of spherical shape...
Journal
Journal title: Decision Making
Year: 2023
ISSN: 2560-6018, 2620-0104
DOI: https://doi.org/10.31181/dmame622023742